Skip to content

blob: removes race condition between ctx cancellation and stream close#167489

Merged
trunk-io[bot] merged 1 commit intocockroachdb:masterfrom
kev-cao:blob/remote-ctx-cancelation
Apr 6, 2026
Merged

blob: removes race condition between ctx cancellation and stream close#167489
trunk-io[bot] merged 1 commit intocockroachdb:masterfrom
kev-cao:blob/remote-ctx-cancelation

Conversation

@kev-cao
Copy link
Copy Markdown
Contributor

@kev-cao kev-cao commented Apr 3, 2026

Previously, our remote local storage writer would always send a half-close and wait for a response when Close was called, even in the event of a context cancellation. This resulted in a race condition where either the context cancellation or Close could be surfaced to the RPC server first. If the former arrived first, the server would Recv a non-EOF error, indicating a failed write and would cleanup the file. If the latter arrived first, the server would see an EOF error, indicating a successful write and the file would be preserved.

As both a context cancellation and a SendClose will cleanup the connection, this commit teaches the remote writer to only SendClose if the context has not been canceled.

Fixes: #167121

Release note: None

Previously, our remote local storage writer would always send a
half-close and wait for a response when `Close` was called, even in the
event of a context cancellation. This resulted in a race condition where
either the context cancellation or `Close` could be surfaced to the RPC
server first. If the former arrived first, the server would `Recv` a
non-`EOF` error, indicating a failed write and would cleanup the file.
If the latter arrived first, the server would see an `EOF` error,
indicating a successful write and the file would be preserved.

As both a context cancellation and a `SendClose` will cleanup the
connection, this commit teaches the remote writer to only `SendClose` if
the context has not been canceled.

Fixes: cockroachdb#167121

Release note: None
@kev-cao kev-cao requested a review from a team as a code owner April 3, 2026 17:43
@kev-cao kev-cao requested review from andrew-r-thomas and removed request for a team April 3, 2026 17:43
@trunk-io
Copy link
Copy Markdown
Contributor

trunk-io Bot commented Apr 3, 2026

😎 Merged successfully - details.

@blathers-crl
Copy link
Copy Markdown

blathers-crl Bot commented Apr 3, 2026

It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR?

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@kev-cao kev-cao requested a review from msbutler April 3, 2026 17:43
Copy link
Copy Markdown
Collaborator

@msbutler msbutler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice job tracking this down! I'd clean up the language in the commit message before merging:

  • s/remote local server/remoteClient/r
  • i don't think "half close" is an actual term?

@kev-cao
Copy link
Copy Markdown
Contributor Author

kev-cao commented Apr 6, 2026

@msbutler

  • I was using "remote local storage writer" since without "local storage", it wouldn't be obvious from the commit message what "remoteClient" was referring to.
  • It's a term used in TCP and HTTP/2 connections, the latter of which is what I believe is used in gRPC streams.

@msbutler
Copy link
Copy Markdown
Collaborator

msbutler commented Apr 6, 2026

ah, nevermind. carry on.

@kev-cao
Copy link
Copy Markdown
Contributor Author

kev-cao commented Apr 6, 2026

TFTR!

/trunk merge

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

backup: TestBackupRestore_FlakyStorage failed

3 participants